CLVM enhancements and fixes#12617
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #12617 +/- ##
=============================================
- Coverage 17.90% 3.68% -14.23%
=============================================
Files 5938 454 -5484
Lines 532864 38798 -494066
Branches 65192 7151 -58041
=============================================
- Hits 95392 1428 -93964
+ Misses 426793 37183 -389610
+ Partials 10679 187 -10492
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5fc1f12 to
9e03f4b
Compare
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801 |
a08e7a5 to
df61d6f
Compare
df61d6f to
43e9384
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #12617 +/- ##
============================================
+ Coverage 18.75% 18.87% +0.12%
- Complexity 17966 18201 +235
============================================
Files 6160 6170 +10
Lines 552578 554819 +2241
Branches 67348 67717 +369
============================================
+ Hits 103660 104748 +1088
- Misses 437512 438557 +1045
- Partials 11406 11514 +108
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
3f900e8 to
c9dd7ed
Compare
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875 |
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877 |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
| // Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false) | ||
| // This handles the case where the volume is already on the correct storage pool | ||
| // but the VM is running on a different host, requiring only a lock transfer | ||
| boolean isClvmLockTransferNeeded = !moveVolumeNeeded && | ||
| isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLockTransferNeeded) { | ||
| // CLVM lock transfer - no data copy, no pool change needed | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lock transfer", "same pool to different host"); | ||
| } else if (moveVolumeNeeded) { | ||
| PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore(); | ||
| if (primaryStore.isLocal()) { | ||
| throw new CloudRuntimeException( | ||
| "Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed"); | ||
| } | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded( | ||
| newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLightweightMigration) { | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lightweight migration", "different pools, same VG"); | ||
| } else { | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can this volume move section be extracted to a separate method to reduce indentation?
There was a problem hiding this comment.
Would it be ok if I addressed this in another PR - with other enhancements that are yet to come.
There was a problem hiding this comment.
This should be a straightfoward refactor. I would prefer having it on this one already, but its up to you.
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18208 |
|
[SF] Trillian test result (tid-16265)
|
…ures, change the clvm vols to exclusive on source from shared, and on success, change dest vol to exclusive only for cross-pool migration
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18209 |
|
@blueorangutan test |
|
@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
| Answer answer = agentManager.send(hostId, | ||
| new ClvmLockTransferCommand(operation, lvPath, volumeInfo.getUuid())); | ||
| if (answer == null || !answer.getResult()) { | ||
| String details = answer != null ? answer.getDetails() : "null answer"; | ||
| logger.warn("CLVM lock command [{}] failed for LV [{}] on host [{}]: {}", operation, lvPath, hostId, details); |
There was a problem hiding this comment.
What is the consequence of improper locking? If we get a false answer here we just continue, what happens then?
If there is no consequence, why are we locking and unlocking?
There was a problem hiding this comment.
The locking here is not about making the device accessible for the migration itself, that's handled earlier (pre-migration ACTIVATE_SHARED, which hard-fails). These post-migration ACTIVATE_EXCLUSIVE calls are state normalization.
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
| // Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false) | ||
| // This handles the case where the volume is already on the correct storage pool | ||
| // but the VM is running on a different host, requiring only a lock transfer | ||
| boolean isClvmLockTransferNeeded = !moveVolumeNeeded && | ||
| isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLockTransferNeeded) { | ||
| // CLVM lock transfer - no data copy, no pool change needed | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lock transfer", "same pool to different host"); | ||
| } else if (moveVolumeNeeded) { | ||
| PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore(); | ||
| if (primaryStore.isLocal()) { | ||
| throw new CloudRuntimeException( | ||
| "Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed"); | ||
| } | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded( | ||
| newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLightweightMigration) { | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lightweight migration", "different pools, same VG"); | ||
| } else { | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
This should be a straightfoward refactor. I would prefer having it on this one already, but its up to you.
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18218 |
2fb696a to
f438897
Compare
|
[SF] Trillian test result (tid-16274)
|
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18221 |
|
@blueorangutan test |
|
@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
|
[SF] Trillian test result (tid-16284)
|


Description
This PR enhances the existing CLVM implementation which was based on the deprecated CLVM technology which was based on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM seems to be broken. CLVM supports RAW volumes on LVM , where as CLVM_NG support QCOW2 on LVM.
Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support
NOTE: On testing - it was identified that incremental snapshots for clvm-ng do not work as expected. As of now it's been removed from scope. So, CLVM and CLVM_NG would only support full snapshots.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?